Issues in Automating Exploratory Data Analysis
Author
Abstract
Exploration often plays a central role in the early stages of scientific inquiry. One can rarely produce models of complex, unfamiliar phenomena on first contact with data. One must interpret suggestive features of the data, observe patterns these features indicate, and generate hypotheses to explain the patterns. Successive steps through the process can lead gradually to a better understanding of underlying structure in the data [Hoaglin et al., 1983; Good, 1983]. Exploratory data analysis (EDA) encompasses a wide range of statistical tools [Tukey, 1977]. Simple exploratory results include histograms that describe discrete and continuous variables, schematic plots that give general characterizations of relationships, partitions of relationships that distinguish different modes of behavior, functional simplification of low-dimensionality relationships, and two-way tables such as contingency tables. These partial descriptions give different views of the data for a more complete, refined picture of underlying patterns. EDA techniques have found application across a variety of scientific domains. In well-known studies, researchers have used EDA to attack problems in grouping corporations [Chen et al., 1974], reducing TELSTAR data [Mallows, 1983], testing validity of approaches to ozone reduction [Cleveland et al., 1974], and examining disease characteristics [Diaconis, 1985]. Our own use of EDA has led us to a better understanding of complex AI systems [Cohen, 1995; St. Amant and Cohen, 1994]. Viewed as search, EDA poses a difficult problem. Suppose we define search operators to be the menu operations in a statistics package.
We now have a range of flexible, powerful possibilities available: arithmetic composition of variables, such as those used in function finding; model-based variable decomposition, as performed by linear regression; partitioning and clustering operations, such as those used in numerical and conceptual clustering systems; feature extraction operations such as statistical summaries; various transfor-
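The search view described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dataset, the three operators `compose`, `decompose`, and `partition` (standing in for arithmetic composition, regression-based decomposition, and partitioning), and the helper `count_paths` are hypothetical and do not come from the paper or from any statistics package.

```python
from itertools import combinations
from statistics import mean, median

# Hypothetical toy dataset: column name -> values (not from the paper).
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
}

def compose(state, a, b):
    """Arithmetic composition: add the elementwise product of two variables."""
    new = dict(state)
    new[f"{a}*{b}"] = [u * v for u, v in zip(state[a], state[b])]
    return new

def decompose(state, a, b):
    """Model-based decomposition: least-squares fit of b on a, keeping residuals."""
    xs, ys = state[a], state[b]
    mx, my = mean(xs), mean(ys)
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant regressor: fall back to the mean
    intercept = my - slope * mx
    new = dict(state)
    new[f"resid({b}~{a})"] = [v - (intercept + slope * u) for u, v in zip(xs, ys)]
    return new

def partition(state, a):
    """Partitioning: keep only the rows at or below the median of variable a."""
    cut = median(state[a])
    keep = [i for i, u in enumerate(state[a]) if u <= cut]
    return {name: [vals[i] for i in keep] for name, vals in state.items()}

def successors(state):
    """Enumerate every operator application available from one state."""
    names = sorted(state)
    for a, b in combinations(names, 2):
        yield f"compose({a},{b})", compose(state, a, b)
    for a in names:
        for b in names:
            if a != b:
                yield f"decompose({a},{b})", decompose(state, a, b)
    for a in names:
        yield f"partition({a})", partition(state, a)

def count_paths(state, depth):
    """Count operator sequences of the given length from a starting state."""
    if depth == 0:
        return 1
    return sum(count_paths(nxt, depth - 1) for _, nxt in successors(state))

if __name__ == "__main__":
    for depth in (1, 2):
        print(depth, count_paths(data, depth))
```

With only three variables and three operators, this toy space already has 12 one-step and 234 two-step operator sequences, because composition and decomposition each add a variable that later operators can act on. A real statistics package offers far more operations, which is one way to read the abstract's claim that EDA viewed as search is difficult.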
Similar resources
Preliminary System Design for an EDA Assistant
Data analysis plays a central role in our attempts to understand the behavior of complex systems. While research in both statistics and artificial intelligence has addressed issues in the automation of later stages of analysis, such as theory generation, model selection, and experiment design [7], less attention has been given to initial exploration of data. Deriving structure from data is never...
Scalable Structure Discovery in Regression using Gaussian Processes
Automatic Bayesian Covariance Discovery (ABCD) in Lloyd et al. (2014) provides a framework for automating statistical modelling as well as exploratory data analysis for regression problems. However, ABCD does not scale due to its O(N^3) running time for the kernel search. This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more ...
An architecture for the SPIN! spatial data mining platform
Geographic Information Systems (GIS) are widely used for analysing and visualizing georeferenced data. In the last few years, a new generation of Geographic Information Systems has emerged that extends the interactivity of dynamically generated maps, greatly enhancing visual exploratory data analysis ([1], [3], [6], [13]). While being an exciting development for automating cartography, these sy...
A Multi-facetted Visual Analytics Tool for Exploratory Analysis of Human Brain and Function Datasets
Brain research typically requires large amounts of data from different sources, and often of different nature. The use of different software tools adapted to the nature of each data source can make research work cumbersome and time consuming. It follows that data is not often used to its fullest potential thus limiting exploratory analysis. This paper presents an ancillary software tool called ...
Strategies for Stepping Out of Visiting-Related Challenges in Intensive Care Units: Descriptive Exploratory Study
Hospitalization in Intensive Care Units (ICUs) is a very stressful experience for the patient and family, and their separation has not been confirmed in any of the studies. At present, ICU visiting is limited, which creates several challenges. Therefore, this descriptive-exploratory study aimed to explore strategies for overcoming the challenges of visiting. This was a descriptive-exploratory quali...